Leaders in any complex organization like the Army are constantly required to
make decisions intended to improve organizational performance. Effective
analysis and decision making by leaders require an understanding of
organizational functioning and the dynamics of organizational change in theory
and practice. Research can be designed to assist leaders in better
understanding how their organizations function and how they may be improved.
However, for such research to provide sound guidance to leaders, the methods
that are employed must be capable of handling the complexities of dynamic
individual and group interaction. Unfortunately, many of the methods currently
employed by social scientists are best suited to handling less complex forms
of data.

The purpose of this report is to provide researchers with statistical tools 
that will assist them in analyzing complex forms of data. The focus of this 
report is on techniques for estimating measurement error, using scores that
are aggregated by group. These scores are useful for evaluating group dynamics 
in organizations as complex as the Army. 
RELIABILITY ESTIMATION FOR AGGREGATED DATA: APPLICATIONS FOR ORGANIZATIONAL
RESEARCH

BRIEF

Requirement: 

In order to study organizations it is important to be able to measure
organizational functioning with a minimum of error. The report that follows
provides the statistical tools necessary to measure the extent of error that
exists in survey data and organizational record data. Traditional methods of
measuring error are either inappropriate or incomplete when applied to
organizational groups, necessitating the statistical development given here.
Appropriate methods of measuring error are particularly important when
organizational change is being studied. In this case, the same variables are
measured at more than one point in time. The investigator wants to identify
real organizational change. However, real change cannot be separated from
changes in measurement error unless separate estimates of measurement error
are available at each point in time. This paper shows how to obtain separate
error estimates so that real organizational change can be studied.




Procedure: 

When research is conducted in an organizational setting, group units of
analysis are often required. When group units of analysis are used, the values
of the variables generally consist of mean scores that have been aggregated
across both survey items and respondents within groups. Analysis of variance
was used here to derive the appropriate reliability formulas for these
aggregated scores. From the definition of reliability, which involves the ratio
of true to total variance, formulas are derived by finding the mean square
components that are equivalent to the reliability definition. This requires use
of expected mean squares for the unit of analysis term and the "error" terms.
Since the aggregated scores typically contain repeated observations across
items as well as survey respondents, with respondents nested within groups, a
split-plot (repeated-measures) design can usually describe the structure of the
data, with a hierarchical structure added as needed. This split-plot design
contains two "error" terms: a split-plot (within-subjects) error term typically
associated with inter-item agreement, and a whole-plot (between-subjects) error
term associated with consensus between respondents. Both types of error can
enter into the reliability formula for aggregated scores, depending on whether
survey items and respondents are considered to be fixed or random, which in
turn depends on the sampling plan. For example, respondents may be fixed (or
partially fixed) if the populations of small groups are exhaustively sampled,
or nearly so. When respondents are fixed, the appropriate reliability formula
is not the same as when respondents are random.
Findings:

Most of the literature on organizations using group units of analysis has
estimated reliability either incorrectly or inconsistently.

The survey construction and item analysis techniques that typically maximize
inter-item agreement may tend to reduce consensus between respondents, so
that surveys like the Survey of Organizations, which were initially constructed
to maximize inter-item agreement, may have poor reliability when consensus
between respondents is desired.

When studying groups within organizations, what level of the hierarchy
should be studied? A statistical technique for estimating the level of the
hierarchy that actually controls the subject matter at hand is provided. This
measure can be used as a guide for selecting groups at appropriate levels of
the hierarchy for study.

Utilization of Findings:

These statistical techniques provide improved procedures for studying the
operation of the Army and other organizations. These techniques are an
essential prerequisite to more advanced time-series procedures that are needed
to study organizational change. If an investigator wishes to examine real
organizational change, the analysis must take into account changes in
measurement error. Sometimes change appears to be real but is due solely to
changes in measurement error. Change in measurement error, instead of real
change, can be used as a plausible alternative explanation for almost any set
of results involving organizational change. If separate estimates of
measurement error are available at each point in time, measurement error can be
taken into account. This paper provides the tools needed to get appropriate
internal consistency estimates of measurement error, and to show how these
estimates change with time. Once these estimates are found, real organizational
change, as distinct from changes in measurement accuracy, can be pinpointed.






CONTENTS

ANALYSIS OF VARIANCE
    Model Statement
    Expected Mean Squares
    Sums of Squares
    Degrees of Freedom

DATA STRUCTURE
    Overview
    Sampling Plans

RELIABILITY FORMULAS
    Derivation
    Interpretation
    Relationship Between Formulas
    Reliability for Record Data

SIGNIFICANCE TESTS
    Difference of Reliability from Zero
    Difference Between Reliabilities

SAMPLE SIZE REQUIREMENTS

UNBALANCED DESIGNS
    Effect on Formulas
    Weighting Scores

SYNCHRONIZATION MEASURES
    Making the Measures Comparable
    Significance of Difference Between Measures
    Removing Synchronization

COMPUTATIONAL REQUIREMENTS

SUMMARY

REFERENCES



TABLES

Table 1  Balanced Expected Mean Squares with Fixed/Random Subjects (S) and
         Items (Q)
      2  Reliability Formulas for Mean Scores as a Function of Unit of Analysis
         and Sampling Plan
      3  Statistical Significance of Reliability Coefficients
      4  Reliability Formulas for Single Scores as a Function of Unit of
         Analysis and Sampling Plan
      5  Formulas for Determining Sample-Size Requirements from Pretest Data
      6  Unbalanced Expected Mean Squares
      7  Synchronization Measures for Determining the Unit of Analysis
      8  Significance of Differences Between Synchronization Measures



RELIABILITY ESTIMATION FOR AGGREGATED DATA: 
APPLICATIONS FOR ORGANIZATIONAL RESEARCH 



With the growth of organizational development over the last twenty years
there has been an increase in field research on the functioning of intact
organizations (Porras, 1979). Such field research has obvious advantages over
laboratory research in terms of the possibilities for external validity, but at
the same time researchers working with intact organizations face a variety of
methodological questions that have not been satisfactorily answered to date.

One very basic question involves the selection of the unit of analysis for
the research design. Individuals are not the appropriate unit of analysis to
test most hypotheses about group functioning. When individuals are not
appropriate units, which of many possible groups, at what level of the
organizational hierarchy, should be selected? The answer will be suggested by
the hypotheses and organizational structure. The researcher wishes to select
units that are responsible for and have control over the dependent variables.
While organizational structure and the hypotheses may suggest which groups at
what hierarchical level control particular variables, and thus provide an
appropriate unit of analysis, the researcher has no way to test this hypothesis
to find out if in fact groups at one level of the hierarchy provide a better
unit of analysis than groups at another level. In principle, if groups at one
level of the hierarchy are responsible for and have control over particular
dependent variables, then we should find homogeneity within and heterogeneity
between the independently operating groups on the dependent measures (see Jones
& Jones, 1975; Bass, Valenzi, Farrow, & Solomon, 1975). This phenomenon will be
called the principle of synchronization, and will be used later to show how to
select appropriate units of analysis.

Evidence that researchers in the field are having trouble selecting units of 
analysis is suggested by the inconsistency with which a particular unit of 
analysis is used. Once a given unit of analysis is selected, this same unit 
should be used for stating hypotheses, calculating reliabilities and norms (when 
survey feedback is involved), estimating validity, and generalizing to new
populations. A common problem is for researchers to state hypotheses and
generalizations in terms of intact organizational groups, but to calculate
reliabilities and estimate validity using individuals (see Bowers, 1973; also 
Passmore, 1976, and Torbert, 1973 for a critique of inconsistent use of units of 
analysis). The researcher may estimate validity with groups but calculate 
reliabilities using individuals (see Taylor & Bowers, 1972, p. 54 for alternation 
between using groups and individuals in calculating reliabilities). 

The researcher who tries to use units of analysis consistently, by computing
reliabilities on the appropriate group units, faces difficulties since an
adequate outline of procedures for estimating reliability on aggregated scores
does not exist. Survey responses are aggregated across both items and
respondents within each group to produce the dependent variable scores. The
sources of true and error variance differ in these aggregated scores from the
same sources of variance in individual level scores, since the structure of the
data differs in the two cases, and for this reason the formulas for estimating
reliability on aggregated scores can differ from the common formulas used with
individuals. Some researchers have looked at inter-item agreement, and others
at agreement between respondents within groups, but none have examined both
sources of agreement in an integrated way. Researchers have looked at
inter-item agreement by computing, for example, Cronbach's alpha on either
individuals or on data aggregated over the unit of analysis for each item (see
Taylor & Bowers, 1972); and at agreement between respondents by using either a
variation of the intraclass correlation (see Jones & Jones, 1977; Ebel, 1951;
Bass et al., 1975) or an iterative jackknife procedure (Schneider, 1972;
Schneider & Bartlett, 1970).

Estimates of construct validity (Cronbach & Meehl, 1955) are in many cases
dependent upon adequate measures of the reliability of the variables involved.
Construct validity consists of hypotheses that make up nomological networks of
expected relationships. The expected relationships involve expectations about
differential levels of association among variables. Differential levels of
association are frequently studied using regression or path analyses, or
cross-lagged correlation analysis (see Kenny, 1975). Statistics that measure
degrees of association among variables are a function of the variables'
reliability as well as the degree of association in the population (McNemar,
1969, p. 163). Any attempt to measure differential levels of association must
control for differential levels of reliability, or demonstrate that
differential levels of reliability do not exist (Kenny, 1975; Jöreskog &
Sörbom, 1979, chap. 4). Failure to calculate reliabilities leaves alternate
explanations open for any set of results. In this sense, it is not possible to
establish construct validity without first taking into account measurement
error, no matter what method of analysis is used: regression, path, or
cross-lagged panel correlation. In this way estimation of validity is dependent
on the measurement of reliability.

The purpose here, then, is (a) to provide criteria for selecting appropriate
units of analysis within intact organizations, and (b) to provide the appropriate
procedures for calculating internal consistency reliabilities on the aggregated
group scores. These internal consistency reliabilities are especially important
in studies of organizational change. They can be used to identify possible
reliability shifts over time. Real organizational changes can then be separated
from changes in measurement error.

An important advantage of using group units over the common approach of
using individuals is that it allows the researcher to study the nature of the
social interaction that occurs between subgroups within the unit (between blacks
and whites, superiors and subordinates, parents and children) in a way that is
not possible when individuals alone are the unit (see Hart, 1978, for an
illustration of this application). This is an advantage that has not been
recognized, even by
researchers with appropriate group data (see Taylor & Bowers, 1972). The 
structure of the data that allows interaction to be studied will be illustrated. 
Analysis of Variance



Analysis of variance can be used both for reliability estimation (see Winer 
1971, pp. 283-296; Myers, 1966, pp. 29^-299; Ebel, 1951) and estimation of 
synchronization for selection of units of analysis. The model statements used 
with aggregated data can be complex, involving many terms that may vary from 
design to design. For this reason an analysis of variance algorithm is given
below, for balanced designs, that is more parsimonious than that provided by many
commonly used texts (e.g., Winer, 1971, pp. 371-375), to assist the reader with
subsequent material and to clarify terminology and notation that is not
completely standard.



Model Statement 

Main effect terms are identified by a single alpha character in caps. 
Nested relationships, if any, are identified by additional alpha characters in 
brackets next to the term in question, showing what this term is nested within. 
Interactions are denoted by two or more alpha characters identifying the
interacting main effects. The full rank model includes interactions between all
combinations of terms, excluding, however, interactions between any terms that 
share a common alpha character. Terms are ordered by examining the alpha 
characters denoting terms. If the alpha characters of one term are a subset of 
the characters of another, the term that is a subset must be placed ahead of the 
other. Nonnested main effect terms with a greater number of other terms nested 
within them are listed ahead of the nonnested main effects with fewer other terms 
nested within them. 
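
The rule for forming the full rank model can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's own procedure: the helper names `parse` and `full_model` are hypothetical, and the main effects are those of the Army example used throughout the paper. The sketch adds every interaction among main effects that share no alpha character (listed or bracketed); it does not apply the ordering rule.

```python
from itertools import combinations

def parse(term):
    """Split a term such as 'BR(A)' into its listed and bracketed letters."""
    live, _, nest = term.partition("(")
    return set(live), set(nest.rstrip(")"))

def full_model(main_effects):
    """Return the main effects plus every admissible interaction."""
    # Letter order follows the first appearance among the main effects.
    order = "".join(term.partition("(")[0] for term in main_effects)
    parsed = [parse(term) for term in main_effects]
    terms = list(main_effects)
    for size in range(2, len(parsed) + 1):
        for combo in combinations(parsed, size):
            letter_sets = [live | nest for live, nest in combo]
            # Terms that share any alpha character do not interact.
            if any(x & y for x, y in combinations(letter_sets, 2)):
                continue
            live = set().union(*(lv for lv, _ in combo))
            nest = set().union(*(ns for _, ns in combo))
            name = "".join(sorted(live, key=order.index))
            if nest:
                name += "(" + "".join(sorted(nest, key=order.index)) + ")"
            terms.append(name)
    return terms

# Main effects of the Army example: brigade, battalion, company, race,
# subjects, questionnaire items, error.
MAIN_EFFECTS = ["A", "B(A)", "C(AB)", "R", "S(ABCR)", "Q", "E(ABCRSQ)"]
MODEL = full_model(MAIN_EFFECTS)
```

With these seven main effects the sketch yields eleven interactions, the same eighteen terms that appear in the model statement of the Data Structure section, although in a different order.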



Expected Mean Squares 

Expected mean squares (EMS) identify how mean squares are divided into the
various components that contribute to the makeup of the mean square. Since
expected mean squares are essential for deriving reliability formulas, the
following algorithm can be used to derive expected mean squares in the balanced
case. To see whether the variance components for other terms occur in the
expected mean squares for the term in question, the alpha characters of the term
in question are examined in relation to the alpha characters of the other terms.
If the term in question is a subset of another term, then the complement of the
characters is taken. If all of the nonbracketed characters belonging to this
complement designate random factors, then the variance component for this other
term does occur in the expected mean squares. The coefficient for this variance
component is the product of the levels of the main effects whose alpha
characters are not listed as part of that other term.

This algorithm, in similar form but with different notation, should be
attributed, to the author's knowledge, to Dr. Melvin Carter, Department of
Statistics, Brigham Young University.
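
The expected mean squares rule can be sketched as a short Python function. This is an illustrative reading of the algorithm, not a definitive implementation: the names `parse` and `expected_mean_square` are hypothetical, the model is the Army example from the Data Structure section, and the numbers of levels for A, R, S, and Q are made up (b = 3 and c = 5 follow the text).

```python
from math import prod

def parse(term):
    """Split 'S(ABCR)' into listed letters and bracketed letters."""
    live, _, nest = term.partition("(")
    return frozenset(live), frozenset(nest.rstrip(")"))

def expected_mean_square(term, model, levels, random_factors):
    """Map each variance component in E(MS) for `term` to its coefficient."""
    live_t, nest_t = parse(term)
    letters_t = live_t | nest_t
    components = {}
    for other in model:
        live_o, nest_o = parse(other)
        letters_o = live_o | nest_o
        if not letters_t <= letters_o:
            continue  # the term in question must be a subset of the other
        # Nonbracketed letters of the other term missing from this term.
        complement = (letters_o - letters_t) & live_o
        if all(factor in random_factors for factor in complement):
            # Coefficient: product of levels of factors not listed in `other`.
            components[other] = prod(
                n for factor, n in levels.items() if factor not in letters_o
            )
    return components

MODEL = [
    "A", "B(A)", "C(AB)", "R", "AR", "BR(A)", "CR(AB)", "S(ABCR)", "Q",
    "AQ", "BQ(A)", "CQ(AB)", "RQ", "ARQ", "BRQ(A)", "CRQ(AB)", "SQ(ABCR)",
    "E(ABCRSQ)",
]
# Hypothetical numbers of levels; E has a single level per cell.
LEVELS = {"A": 4, "B": 3, "C": 5, "R": 2, "S": 10, "Q": 6, "E": 1}

# Companies as the term in question, under two sampling plans.
both_random = expected_mean_square("C(AB)", MODEL, LEVELS, {"A", "S", "Q", "E"})
subjects_fixed = expected_mean_square("C(AB)", MODEL, LEVELS, {"A", "Q", "E"})
```

Treating subjects as fixed removes the components that involve S from the expected mean square for companies, which is the effect the fixed/random distinction has on the reliability formulas.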

Sums of Squares 

The sums of squares for any balanced complete-block design can be readily
obtained by: (a) taking the sum over levels of main effects not listed for the
term in question; (b) squaring these sums and then summing over levels of main
effects that are listed; and (c) dividing this sum by the product of the levels
of main effects not listed. The sum of squares for the term in question is then
obtained by subtracting all sums of squares of terms that are subsets of the
term in question. This includes the μ term.
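
Steps (a) through (c) and the final subtraction can be sketched in Python on a small balanced layout. The design here is a simplified assumption (groups A, subjects S nested within groups, items Q), and the scores, level counts, and names `MODEL`, `DATA`, and `sum_of_squares` are all hypothetical.

```python
from collections import defaultdict
from functools import lru_cache
from itertools import product
from math import prod

FACTORS = ("A", "S", "Q")            # groups, subjects within groups, items
LEVELS = {"A": 2, "S": 3, "Q": 4}    # hypothetical numbers of levels
# Each term is the set of its listed and bracketed letters; mu is empty.
MODEL = {
    "mu": frozenset(), "A": frozenset("A"), "S(A)": frozenset("AS"),
    "Q": frozenset("Q"), "AQ": frozenset("AQ"), "SQ(A)": frozenset("ASQ"),
}
# Made-up scores, one per cell of the balanced design.
DATA = {
    (a, s, q): 3.0 + a + 0.5 * s * (a + 1) + 0.25 * q * (s + 1)
    for a, s, q in product(range(2), range(3), range(4))
}

@lru_cache(maxsize=None)
def sum_of_squares(term):
    # (a) total the scores over the levels of the factors not listed
    totals = defaultdict(float)
    for cell, y in DATA.items():
        key = tuple(i for f, i in zip(FACTORS, cell) if f in term)
        totals[key] += y
    # (b) square and sum over listed levels; (c) divide by the product of
    # the levels of the factors not listed
    divisor = prod(n for f, n in LEVELS.items() if f not in term)
    raw = sum(t * t for t in totals.values()) / divisor
    # subtract the sums of squares of all subset terms, including mu
    return raw - sum(sum_of_squares(t) for t in MODEL.values() if t < term)

SS = {name: sum_of_squares(letters) for name, letters in MODEL.items()}
```

Because every term's subsets are subtracted, the sums of squares, including the one for μ, add up to the total of the squared scores.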

Degrees of Freedom 

Degrees of freedom for each term are obtained by taking the product of the 
levels of the main effects that are listed for the term in question, and then 
subtracting the degrees of freedom of all terms that are subsets of the term in 
question. Again this includes the μ term.
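
The degrees-of-freedom rule lends itself to a recursive sketch. The Python below applies it to the Army model used later in the paper; the level counts for A, R, S, and Q are hypothetical (b = 3 and c = 5 follow the text), and the names `letters` and `degrees_of_freedom` are illustrative only.

```python
from functools import lru_cache
from math import prod

# Hypothetical numbers of levels; E has one level (one score per cell).
LEVELS = {"A": 4, "B": 3, "C": 5, "R": 2, "S": 10, "Q": 6, "E": 1}
MODEL_NAMES = [
    "mu", "A", "B(A)", "C(AB)", "R", "AR", "BR(A)", "CR(AB)", "S(ABCR)",
    "Q", "AQ", "BQ(A)", "CQ(AB)", "RQ", "ARQ", "BRQ(A)", "CRQ(AB)",
    "SQ(ABCR)", "E(ABCRSQ)",
]

def letters(term):
    """All alpha characters of a term, listed and bracketed; mu has none."""
    if term == "mu":
        return frozenset()
    return frozenset(term.replace("(", "").replace(")", ""))

MODEL = {name: letters(name) for name in MODEL_NAMES}

@lru_cache(maxsize=None)
def degrees_of_freedom(term):
    # Product of the levels of the main effects listed for the term ...
    cells = prod(LEVELS[f] for f in term)
    # ... minus the degrees of freedom of every subset term, including mu.
    return cells - sum(
        degrees_of_freedom(t) for t in MODEL.values() if t < term
    )

DF = {name: degrees_of_freedom(t) for name, t in MODEL.items()}
```

The rule reproduces the familiar closed forms, for example ab(c - 1) for C(AB) and (a - 1)(r - 1) for AR. With one score per cell, the sketch leaves no degrees of freedom for the E term.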

Data Structure 



Overview

Reliability estimation is dependent upon specifying the structure of the
data, which can be identified with an analysis of variance model statement. The
following analysis of variance model statement illustrates the type of structure
frequently encountered with survey data taken from intact organizational groups.
The model statement is used to describe U.S. Army organization, but could equally
fit most organizations, and is used as an example throughout the paper.
Y = μ + A + B(A) + C(AB) + R + AR + BR(A) + CR(AB) + S(ABCR) + Q + AQ +
    BQ(A) + CQ(AB) + RQ + ARQ + BRQ(A) + CRQ(AB) + SQ(ABCR) + E(ABCRSQ)      (1)

where  A = 1, ..., a;  brigade, random
       B = 1, ..., b;  battalion, fixed
       C = 1, ..., c;  company, fixed (except where explicitly specified as
                       random)
       R = 1, ..., r;  race, fixed
       S = 1, ..., s;  subjects, fixed or random
       Q = 1, ..., q;  questionnaire items, fixed or random
       E = 1, ..., 1;  error, random

An Army company consists of approximately 150 soldiers who work together.
There are five companies within a battalion and three battalions within a
brigade. The hierarchical nature of the organization is specified by the
completely nested hierarchical portion of the design (A, B, and C). Assuming
enough units were available, either brigades, battalions, or companies could be
selected as the unit of analysis. Nesting any number of hierarchical levels is
possible. The hierarchical data structure is a very general one that can be
applied to most organizations in many societies. It can apply also to
generational hierarchies in groups organized along familial lines. Mixed
hierarchies can also be examined, with families nested within the parental
occupational organization(s).

Following the hierarchical part of the design, the term Race (R) appears,
which crosses the hierarchical groups (i.e., it is not nested within them). This
crossed term, whether it designates a variable like race (black-white), or rank
(supervisor-subordinate), or even generation (parent-child), designates
subgroups that represent repeated measurements across the unit of analysis
(e.g., companies, families). Repeated measurements across the unit of analysis
can be used to examine the interaction between the subgroups that are repeated,
by correlating the responses of the subgroups across the units and, when
available, across time using cross-lagged panel correlation or path analysis
(see Hart, 1978). Interaction between subgroups can be examined over time in
this manner. In addition to the single crossed term Race (R), other crossed
terms designating subgroups with their associated interaction terms are
possible, as well as covariates without interactions.

The term representing Questionnaire items (Q) is crossed with both the
nested Subjects term (S) and the hierarchical terms (A, B, C), which means
questionnaire items can be considered repeated measures in two ways: across both
subjects and the unit of analysis (A, B, or C). Just one such term is expected,
representing survey items. Succeeding terms represent interactions with Q. Data
that are repeated in both ways contain common-method variance (see Campbell &
Fiske, 1959) not found in data repeated only across the unit of analysis, so that
correlations between variables that are repeated in both ways should be inflated
in relation to correlations based on data that are repeated only across the unit
of analysis and not across subjects. Data that are repeated in two ways are
represented by the ratings of a single subgroup within the unit of analysis on
two different scales, while data that are repeated in only one way are
represented by ratings from two different subgroups on two different scales.
Methods of reliability estimation that use the commonality between all variables
in an analysis (see Kenny, 1975, pp. 897-899; Jöreskog & Sörbom, 1979, chap. 4)
are not appropriate for data structures, as above, in which correlations are
influenced by whether the variable is "repeated" in more than one way. Internal
consistency reliabilities are preferable with the above data structure.

Overall, the model can be considered a hierarchical split-plot (or repeated-
measures) design. The Q term and interactions with Q represent Within-Subjects
variance, while the hierarchical and crossed terms with their interactions
represent Between-Subjects variance, as found in a split-plot (repeated-
measures) design. The Between-Subjects variance can be further divided into
two parts: the hierarchical part representing Between-Groups variance, and the
crossed term(s) with their interactions representing Within-Groups variance,
thus creating the hierarchical split-plot design. Analysis of variance designs
like the above generally have more than one error term. For example, the term SQ
can be considered an appropriate error term to test within-subjects terms, and S
an error term to test between-subjects terms. Furthermore, the hierarchical
terms C and B might be considered error terms under some circumstances. Error
terms are dictated not only by the model but also by the terms considered fixed
and random. The determination of whether a term is fixed or random depends on
the sampling plan of the design.



Sampling Plans 

In the previous model statement, Brigades (A) may have been sampled in a
random or at least representative fashion, while Battalions (B) and Companies (C)
may have been sampled in an exhaustive fashion. Brigades may therefore be random
while battalions and companies within brigades are fixed, since the population of
these units was exhaustively sampled. In the preceding example the nested
hierarchical terms B and C were fixed, but in rare cases such terms could be
random. For example, if countries were used as a unit of analysis, and in the
sampling plan cities were randomly selected to represent countries, with subjects
randomly selected within cities, the nested hierarchical term, cities, could be
random as well as subjects.

The Subjects term (S) in the previous example, nested within Companies (C)
and Race (R), will be considered fixed or random depending on how exhaustively
the population of subjects within companies is sampled. The subjects term is
fixed when all soldiers (approximately 150) are sampled, and random when a very
small fraction of the company population is sampled. The fixed-random
distinction is determined by the sampling fraction (s/N, sample size over
population size), with terms fixed when the ratio is one and random when the
ratio is zero. In practice, the subjects term often will be neither fixed nor
random. The company populations are quite small and it is not unusual at all for
a sampling plan to call for sampling a fraction of the population (e.g., 1/3)
that approaches neither one nor zero. In those cases, the subjects term will be
labeled semirandom. The Questionnaire items (Q) may likewise be considered
random if the items in the survey are considered a random selection from a
potentially infinite population of items measuring the same concept, or fixed if
the items are considered to exhaust the population of interest.
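
The sampling-fraction rule can be expressed as a small Python helper. This is only a sketch: the function name `classify_factor` and the `tolerance` cutoff for "close enough to zero or one" are assumptions introduced here, since the text gives no numerical threshold.

```python
def classify_factor(sample_size, population_size, tolerance=0.05):
    """Label a factor fixed, random, or semirandom from its sampling fraction.

    `tolerance` is an assumed cutoff, not a value taken from the report.
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("need 0 < sample_size <= population_size")
    fraction = sample_size / population_size
    if fraction >= 1 - tolerance:
        return "fixed"        # population exhaustively sampled, or nearly so
    if fraction <= tolerance:
        return "random"       # very small fraction of the population
    return "semirandom"       # approaches neither one nor zero
```

For a company of about 150 soldiers, sampling everyone gives a fixed subjects term, while sampling one third gives a semirandom term.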

Subjects could be considered random or semirandom and items fixed in a
cross-lagged correlation design using groups as the unit of analysis (see Hart,
1978). In this design, a sample of subjects within companies can be selected to
represent the whole company population, so subjects are random or semirandom. 
Cross-lagged correlation looks at time-related changes assuming stationarity — 
constant item structure over time (Kenny, 1975). In such cases it may often be 
reasonable to assume items are fixed when looking at time-related changes in this 
way. Likewise, subjects can be considered fixed and items random in most single- 
time, survey-feedback designs. In this case, entire company populations are 
frequently sampled, while items are considered a sample of a larger conceptual 
population. In this sampling plan subjects become fixed and items random. Of 
course, in many designs both subjects and items may be random or at least 
semirandom. 



Reliability Formulas 



Derivation 

The sampling plans given above have a direct impact on the appropriate
reliability formulas. A requirement for measuring reliability is to divide the
variance associated with the unit of analysis into true and error components.
The unit of analysis in this case is an aggregated group score instead of an
individual response. If the unit of analysis is the Companies term (C), the
expected mean squares for this term show the underlying components that are
expected in the make-up of the observed mean square. These underlying components
can be divided into true and error variance. This provides a way of allocating
the observed company mean square into true and error components. The sampling
plan determines which terms are fixed and random. This in turn affects the
expected mean squares for the unit of analysis and the allocation of true and
error components to the observed mean square, which then affects the reliability
formula. Table 1 shows how the expected mean squares in the balanced case
change, for selected terms, as a function of whether Subjects (S) and
Questionnaire items (Q) are considered fixed or random. Reliability is defined
as the ratio of true to total variance. The variance component defined as true
variance is always that component associated with the unit of analysis, in this
case either Companies (C), Battalions (B), or Brigades (A). As indicated by
Table 1 there is more than one "error" term when both items and subjects are
random. In general, as the number of random main effects following the unit of
analysis increases, the number of components considered to be error increases
dramatically (see Formula 11, Table 2).

Table 1

Balanced Expected Mean Squares with Fixed/Random
Subjects (S) and Items (Q) ¹

Term                          Expected Mean Squares ²

A         brigade             $bcrsq\,\sigma^2_A + [q\,\sigma^2_S] + (bcrs\,\sigma^2_{AQ}) + [(\sigma^2_{SQ})] + \sigma^2_e$
B(A)      battalion           $crsq\,\sigma^2_B + [q\,\sigma^2_S] + (crs\,\sigma^2_{BQ}) + [(\sigma^2_{SQ})] + \sigma^2_e$
C(AB)     company             $rsq\,\sigma^2_C + [q\,\sigma^2_S] + (rs\,\sigma^2_{CQ}) + [(\sigma^2_{SQ})] + \sigma^2_e$
S(ABCR)   subjects            $q\,\sigma^2_S + (\sigma^2_{SQ}) + \sigma^2_e$
AQ        brigade X items     $bcrs\,\sigma^2_{AQ} + [\sigma^2_{SQ}] + \sigma^2_e$
BQ(A)     battalion X items   $crs\,\sigma^2_{BQ} + [\sigma^2_{SQ}] + \sigma^2_e$
CQ(AB)    company X items     $rs\,\sigma^2_{CQ} + [\sigma^2_{SQ}] + \sigma^2_e$
SQ(ABCR)  subjects X items    $\sigma^2_{SQ} + \sigma^2_e$

¹ The model and notation are found in the text (see Equation 1). The term A is
random with B and C fixed. Subjects (S) and Questionnaire Items (Q) are either
fixed or random. Lower case letters denote the number of levels of the
corresponding factors in caps.

² When subjects are fixed, terms within brackets are deleted. When
questionnaire items are fixed, terms within parentheses are deleted.

Reliability for the group mean scores is formally defined in Table 2. The
expected mean squares, shown in Table 1, for the unit of analysis (C) are divided
by rsq, the product of the levels that are summed to obtain the group means. The
divided expected mean squares represent the components expected in the group
means, components that vary according to the sampling plan. The component due to
the unit of analysis (C), divided by all components, represents the ratio of true
over total variance needed for the reliability definition. Mean square terms are
set equal to the corresponding expected mean squares, and then the equations are
solved for the variance components. For example, the variance components for
definition 3 in Table 2 equal:

$$\sigma^2_C = (MS_C - MS_S)/rsq; \qquad q\,\sigma^2_S + \sigma^2_e = MS_S.$$

The mean square estimates of the variance components are substituted for the
corresponding variance components in the reliability definition, and then
simplified algebraically. This process produced the reliability formulas in
Table 2.
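
As a worked check of this substitution, consider the case the text calls definition 3 (unit of analysis C, items fixed, subjects random). The derivation below is a sketch under stated assumptions: the symbol $\rho_C$ for the reliability of the company means is introduced here, and the definition is taken to be the expected mean square for C divided by $rsq$, as described above. It uses $\sigma^2_C = (MS_C - MS_S)/rsq$ and $q\,\sigma^2_S + \sigma^2_e = MS_S$.

```latex
\begin{align*}
\rho_C &= \frac{\sigma^2_C}{\sigma^2_C + (q\,\sigma^2_S + \sigma^2_e)/rsq} \\[4pt]
       &= \frac{(MS_C - MS_S)/rsq}{(MS_C - MS_S)/rsq + MS_S/rsq} \\[4pt]
       &= \frac{MS_C - MS_S}{MS_C}
\end{align*}
```

The common divisor $rsq$ cancels, so the estimate depends only on the two observed mean squares.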

The unit of analysis for Formulas (3) through (10) is Companies (C). When
the unit is Battalions (B) or Brigades (A), the definitions and reliability
formulas are the same as in Table 2, with the following substitutions:
(a) $\sigma^2_C$ becomes $\sigma^2_B$ or $\sigma^2_A$; (b) $\sigma^2_{CQ}$
becomes $\sigma^2_{BQ}$ or $\sigma^2_{AQ}$; (c) $MS_C$ becomes $MS_B$ or $MS_A$;
and (d) $MS_{CQ}$ becomes $MS_{BQ}$ or $MS_{AQ}$. When the unit of analysis is
Battalions (B), the terms including B are substituted, and when the unit is
Brigades (A), A is substituted. The error terms in the denominator of the
reliability definitions are divided by an additional coefficient: c for
Battalions and bc for Brigades.

Estimating reliability involves estimating ratios of variance components.
The expectation of these ratios contains a slight positive bias. Winer (1971,
pp. 248-249; 282-290) has given a correction for this bias for the standard
formulas (Formula 2, Table 2; Formula 26, Table 4). This correction, when
extended to any of the formulas in Table 2, has the following form:

$$MS_{unit} - \frac{df_{error}}{df_{error} - 2}\, MS_{error} \tag{12}$$

where $MS_{unit}$ is the mean square for the unit of analysis and $MS_{error}$
represents the mean square term(s) measuring error. The term(s) subtracted from
$MS_{unit}$ in the numerator of the formulas in Table 2 are $MS_{error}$. In
words, the correction involves multiplying $MS_{error}$ by a correction term
that approaches one as the degrees of freedom for error increase. When
$MS_{error}$ involves more than one mean square term, the adjusted degrees of
freedom for these several terms are found by referring to the formula given
later. For all practical purposes the positive bias in the reliability formulas
in Table 2 is negligible with as many degrees of freedom for $MS_{error}$ as is
customary with organizational surveys.

Table 2

Reliability Formulas for Mean Scores as a Function of Unit of Analysis and
Sampling Plan

[Table body not recoverable from the scanned source. Columns: Unit of Analysis;
Sampling Plan (items and subjects each fixed, semirandom, or random);
Reliability Definition; Formula; Number. The table lists Formulas (2) through
(11).]

Note. Formulas (4) and (6) assume $\sigma^2_{SQ} = 0$, so that
$MS_{SQ} = \sigma^2_e$.

Another bias may be more serious. As with any analysis of variance design, 
if significant terms are omitted from the model statement, these omitted terms 
will artificially inflate $MS_{error}$. Reliability will be underestimated to the 
extent significant terms are omitted from the model statement. For example, 
omitting Race (R) when it, or its interactions, are significant, increases the 
size of $MS_S$. It is desirable to specify model statements that capture the 
structure of the data as completely as possible even if this creates model 
statements with large numbers of terms. 



Interpretation 

The reliabilities are internal consistency measures of reliability. As such 
they represent reliability at any one discrete point in time. At this point in 
time the reliabilities measure the extent to which the researcher would expect to 
obtain the same thing if the measurement process were repeated. They estimate 
the correlation between the mean scores, for the unit of analysis, and another 
set of mean scores that would be expected if the measurement process had been 
repeated at the same time. The reliability would also be considered an estimate 
of the correlation between the observed sample means and the means that would 
have been obtained if the entire population of subjects/items had been measured. 

The sampling plans differ for different reliability formulas. Sampling is 
conducted without replacement (i.e., no respondent takes the survey twice at one 
time) which creates the practical effect of sampling from a population that can 
be considered finite. When subjects are fixed, the "observations" that make up 
the variation due to subjects, $\sigma^2_S$, remain the same in the hypothetical new 
sample as they were in the observed sample, and when subjects are semirandom the 
proportion of these elements in each group that remain the same equals $s / N_S$ 
(sample over population size). Likewise, when items are fixed, the 
"observations" due to the component $\sigma^2_{CQ}$ are identical in the observed and 
hypothetical new sample, and in the semirandom case the proportion of elements 
that are the same equals $q / N_Q$. When the sample size equals the population 
size (i.e., the term is fixed), the same scores are selected twice, the mean 
scores are measured without error, and the reliability is perfect. When a term 
is semirandom, the hypothetical new sample will contain $n / N$ elements in 
common with the old sample and the population. When a term is random, none of 
the elements that make up that component remains the same in the new sample or 
population. Declaring a term fixed or random, then, is the same as assuming the 
elements that go into a particular variance component either change or do not 
change from the observed sample to a hypothetical new one or to the population. 
They do not change if the sample size equals the population size. 

Relationship Between Formulas 

In fact, there is a close connection between average intercorrelation, 
reliability as computed by Cronbach's alpha, and analysis of variance. 
Cronbach's alpha is identical to the Spearman-Brown prediction formula applied to 
the average intercorrelation between items (see Ebel, 1951). Formula 2 in 
Table 2 differs from Cronbach's alpha only in that analysis of variance, with its 
attendant assumptions, is used to estimate the average intercorrelation between 
items (see Formula 26, Table 4). This estimate of the average intercorrelation 
(Formula 26), when corrected by the Spearman-Brown prediction formula, equals 
Formula 2. 
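
The connection between alpha, the Spearman-Brown formula, and analysis of variance can be illustrated with a small numerical sketch. It assumes the standard two-way subjects-by-items layout, in which the mean-score reliability is $(MS_S - MS_{SQ})/MS_S$ and the single-item reliability is $(MS_S - MS_{SQ})/[MS_S + (q - 1)MS_{SQ}]$; these forms are assumed here to play the roles of Formula 2 and Formula 26. The function names and the score matrix are hypothetical.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha from a subjects-by-items score matrix."""
    n_items = scores.shape[1]
    sum_item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return n_items / (n_items - 1) * (1 - sum_item_vars / total_var)


def anova_mean_squares(scores):
    """Return (MS_S, MS_SQ) for a two-way layout with one score per cell."""
    n_subj, n_items = scores.shape
    grand = scores.mean()
    ss_subj = n_items * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_item = n_subj * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((scores - grand) ** 2).sum()
    ms_subj = ss_subj / (n_subj - 1)
    ms_resid = (ss_total - ss_subj - ss_item) / ((n_subj - 1) * (n_items - 1))
    return ms_subj, ms_resid


def mean_score_reliability(ms_subj, ms_resid):
    """Reliability of the mean score over items."""
    return (ms_subj - ms_resid) / ms_subj


def single_item_reliability(ms_subj, ms_resid, n_items):
    """ANOVA estimate of the reliability of a single item."""
    return (ms_subj - ms_resid) / (ms_subj + (n_items - 1) * ms_resid)


def spearman_brown(r_single, n):
    """Predicted reliability when a single score is lengthened n times."""
    return n * r_single / (1 + (n - 1) * r_single)


if __name__ == "__main__":
    # Hypothetical data: 5 subjects by 3 items.
    scores = np.array([[4, 5, 4], [2, 3, 3], [5, 5, 4], [1, 2, 2], [3, 4, 5]], dtype=float)
    ms_s, ms_sq = anova_mean_squares(scores)
    print(cronbach_alpha(scores))
    print(mean_score_reliability(ms_s, ms_sq))
    print(spearman_brown(single_item_reliability(ms_s, ms_sq, 3), 3))
```

All three printed values coincide for any score matrix with non-zero subject variance, which is the sense in which the three approaches are connected.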

When computing reliability for aggregated scales, researchers typically 
compute Cronbach's alpha on group means, computed separately for each item, which 
is the same as computing the average intercorrelation between these item means 
and adjusting the average correlation with the Spearman-Brown prediction 
formula. This is closely approximated by Formula 5, Table 2. The average 
intercorrelation between company mean scores for each item is estimated by 
Formula 27, Table 4. When this analysis of variance estimate of the average 
intercorrelation is corrected by the Spearman-Brown prediction formula it equals 
Formula 5. The use of Cronbach's alpha to estimate the reliability of group mean 
scores requires the same sampling assumptions as does Formula 5: subjects fixed 
and items random. When subjects are sampled from large intact organizational 
groups, Formula 5 is not appropriate and neither is Cronbach's alpha. For 
example, Taylor and Bowers (1972) used Cronbach's alpha both on exhaustive and 
ten percent samples of subjects. Formula 5 should have given way to Formula 8 
with the ten percent sample if the assumption of random items had been made. 

A comparison of Formulas (2) and (3), Table 2, shows an interesting 
relationship between variance components. When individuals are used as the unit 
of analysis, the between-subject variance $\sigma^2_S$ represents true variance, but 
when companies are the unit and subjects are random, as in Formula 3, the term 
$\sigma^2_S$ represents error variance. It is true that the subjects components 
$\sigma^2_S$ are not identical in the two cases since the models differ, but they are 
very similar. The subjects mean square ($MS_S$) in Formula 3 has been reduced 
compared to the subjects mean square $MS_S$ in Formula 2, to the extent that other 
"between subjects" terms from the model in Equation 1 are significant, but 
otherwise the terms are the same. Maximizing the variance between subjects will 
increase reliability as measured by Formula 2, but can decrease it as measured 
by Formula 3. In constructing the Survey of Organizations (see Taylor & Bowers, 
1972), "between subjects" variance was maximized by such techniques as (a) 
positive wording of all questions, (b) contiguous placement of items from the 
same scale, (c) positive response alternatives lined up on the same side of the 
scale, and (d) selection of items with large "between subjects" contributions. 
These techniques will maximize reliability as measured by Formula 2. The 
techniques seem to maximize subject differences by increasing variance 
due to response sets. If this is the case, this subject variance would be 
expected to inflate $MS_S$ as error in Formula 3. It is possible that these 
techniques also reduce $\sigma^2_E$, so they may not always increase $MS_S$ as error. 
In Formula 3 we wish to maximize $MS_C$ in relation to $MS_S$. The preceding 
techniques used in the Survey of Organizations could easily, but not 
necessarily, increase $MS_S$ in relation to $MS_C$, reducing reliability. Since the 
Survey of Organizations, and others like it, use intact organizational groups as 
units, Formula 3 rather than 2 is most appropriate and should be used when 
subjects alone are random. 

Formulas 2 and 5 have generally been used to establish reliability for 
organizational surveys. It should be apparent from Table 2 that there is no 
necessary relationship between reliability as measured by one formula and 
reliability as measured by another. Furthermore, there may sometimes be a 
negative relationship between reliability as measured by Formulas 2 and 3. 
Organizational surveys that claim to have well established reliabilities, using 
Formulas 2 or 5, have not established reliability at all for the situations in 
which Formulas 3, 4, 7, 8, 9 or 10 are most appropriate. In fact, it is 
reasonable to suppose that many of these "well established reliabilities" will 
not prove to be reliable at all as measured by Formula 3, since no attempt has 
been made, using pretest samples, to select items that discriminate well between 
group units, while a corresponding effort has been made to find items that have 
high intercorrelations. It is important to find which scales are in fact 
reliable using appropriate formulas. Research in this direction may require a 
reassessment of the reliabilities of the scales used in organizational research, 
as well as of the interpretations of results in this area. 



Reliability for Record Data 

Frequently variables representing group units of analysis are not measured 
by survey but can be found in the form of frequency counts of events within the 
group that occurred during a given time period. Often these frequency counts are 
expressed in the form of rates (e.g., per 1,000) or percentages. The use of rates 
or percentages is generally not a good idea when the variables are to be 
correlated, since this creates the attendant problems of index correlation (see 
McNemar, pp. 180-182). A better approach is to use the raw frequency counts, and 
partial out the effects of sample size (Cronbach & Furby, 1970). Reliability for 
such frequency counts can be computed using analysis of variance, with the group 
size variable used as a covariate. The model in this case differs slightly from 
that shown in Equation (1). The following model defines the structure of the 
data in the case with three levels of hierarchy: 

Y = A + B(A) + C(AB) + D + AD + BD(A) + CD(AB) + E(ABCD)     (13) 

where A = 1, ..., a; B = 1, ..., b; C = 1, ..., c; D = 1, 2; and E = 1. 


The addition of another crossed term like Race (R), that is fixed, does not affect 
the reliability definition or formula, so it was omitted. In addition to the 
above model the group size variable can be added as a covariate. The term D can 
represent either a random dichotomous split, or a dichotomous split that controls 
for a variable like time (e.g., one level represents events that occurred on odd 
numbered days and the other level events that occurred on even numbered days for 
the time period in question). The split may have to be random when the time 
variable is not available on a case by case basis. The fact that a random split 
is possible means that an internal consistency reliability can be computed when 
only frequency counts are available for each group. Researchers often assume it 
is not possible to compute reliability in this case. The reliability definition 
and formula are given as follows: 



(14) 



When random splits within groups are necessary to obtain the observations for the 
term D, greater stability in the reliability estimates can be obtained by a 
jackknife procedure in which $MS_{CD}$ in Formula (14) is estimated several times 
using different random splits each time. The different estimates can then be 
averaged prior to using the averaged estimate in Formula (14). When the term D is 
fixed the record variable in question is considered to be measured without error 
and an estimate of reliability is not needed. This would occur if (a) the 
researcher was willing to limit generalizations to that particular variable 
alone, and (b) the frequencies of that variable were a census rather than a 
sample of the relevant events. 
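
The repeated random-split averaging described above can be sketched as follows. This is a simplified illustration and not the procedure of the report: it assumes a single level of grouping (companies only), ignores the group-size covariate, assigns each event to one of the two halves with probability one half, and takes the companies-by-split interaction mean square as the quantity being averaged. The function names and the counts are hypothetical.

```python
import numpy as np


def split_counts(counts, rng):
    """Randomly assign every event in each group to one of two halves (term D)."""
    first_half = rng.binomial(counts, 0.5)
    return np.column_stack([first_half, counts - first_half]).astype(float)


def interaction_mean_square(table):
    """Groups-by-split interaction mean square for a k x 2 table of counts."""
    k, d = table.shape
    grand = table.mean()
    ss_rows = d * ((table.mean(axis=1) - grand) ** 2).sum()
    ss_cols = k * ((table.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((table - grand) ** 2).sum()
    return (ss_total - ss_rows - ss_cols) / ((k - 1) * (d - 1))


def averaged_interaction_ms(counts, n_splits=200, seed=0):
    """Average the interaction mean square over several random splits."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    estimates = [
        interaction_mean_square(split_counts(counts, rng)) for _ in range(n_splits)
    ]
    return float(np.mean(estimates))


if __name__ == "__main__":
    # Hypothetical frequency counts of an event in six companies.
    print(averaged_interaction_ms([12, 30, 7, 22, 15, 41]))
```

The averaged value would then stand in for the single-split mean square wherever the reliability formula calls for it; the between-groups mean square does not depend on the split, since each group's total count is unchanged.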

Significance Tests 

Difference of Reliability from Zero 

It is important to ask if it is possible to detect a significant amount of 
true variance at all, i.e., is the reliability coefficient significantly 
different from zero. One form in which this test can be made is to compare total 
to error variance, forming an F ratio, to see if a detectable amount of true 
variance exists. The form of the F test differs slightly from the reliability 
ratio (true over total variance), but provides a test with the same components. 
The test definitions and F tests for reliability Formulas 3 through 10 are shown 
in Table 3. The error terms in the denominators of the F ratios in Table 3 can be 
found in different form as the quantity subtracted from $MS_C$ in the numerator of 
the reliability formulas in Table 2. The error terms are expressed in different 
form in Table 3 because tests (17) through (23) are quasi-F tests, i.e., tests 
involving more than two mean square terms in the F test. In this case, the F test 
is an approximation which is obtained by adjusting the degrees of freedom for 
both the numerator and denominator separately, by the formula given in 
Satterthwaite (1946): 

$$df_{adj} = \frac{(a_1 MS_1 + a_2 MS_2 + \cdots)^2}{\dfrac{(a_1 MS_1)^2}{df_1} + \dfrac{(a_2 MS_2)^2}{df_2} + \cdots}$$ (24) 

where $MS_1$ and $MS_2$ are independent mean squares, $df_1$ and $df_2$ are their 
degrees of freedom, and $a_1$ and $a_2$ are the coefficients for the mean squares. 
The mean squares in Table 3 are shown in a form that gives separate coefficients 
for each mean square as required by Formula 24. In the case where group size is 
unbalanced, and the coefficients, $a_i$, vary from company to company, the 
quantity $a_i MS_i$ can be obtained most accurately by weighting individual 
scores as appropriate (e.g., Formula 42, as described later). 
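
The adjusted degrees of freedom of Formula 24 are straightforward to compute once the coefficients, mean squares, and their degrees of freedom are listed. A minimal Python sketch follows; the function name and the numerical values are hypothetical, and the example denominator simply combines two mean squares with sampling-fraction coefficients of the kind that appear in the quasi-F tests.

```python
def satterthwaite_df(coefficients, mean_squares, dfs):
    """Adjusted degrees of freedom for a linear combination of mean squares.

    coefficients, mean_squares, and dfs are parallel sequences: a_i, MS_i,
    and the degrees of freedom of MS_i. The mean squares are assumed
    independent.
    """
    numerator = sum(a * ms for a, ms in zip(coefficients, mean_squares)) ** 2
    denominator = sum(
        (a * ms) ** 2 / df for a, ms, df in zip(coefficients, mean_squares, dfs)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical values: population size 200 and sample size 20 per group,
    # so the two coefficients are (N - s)/N = 0.9 and s/N = 0.1.
    print(satterthwaite_df([0.9, 0.1], [3.0, 1.2], [180, 720]))
```

With a single mean square the function returns that mean square's own degrees of freedom, so an ordinary F test is recovered as a special case.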

Difference Between Reliabilities 

In some situations it is important to know whether reliabilities are 
significantly different from each other. For example, using cross-lagged panel 
correlation (Kenny, 1975), it is important to know whether reliability changes 
over time. When reliability changes, corrections for reliability shifts are 
made. A statistical test for reliability shifts is desirable and can be made 
when the reliabilities are expressed in the form of F ratios as shown previously 
in Table 3, and the assumption is made that the mean square terms are 
independent. In the case where measurements are made on group units at more than 
one point in time, with different subjects sampled on each occasion, the samples 
involve the same group populations but different subjects. In analysis of 
variance terms, the measurements are repeated across companies, but not across 
subjects. The mean square terms under these conditions approximate independence. 
The bias due to lack of independence is loss of power. Degrees of freedom are 
large enough so that power is not low in any case. Following Winer (1971, pp. 
245-247), hypotheses related to the equality of two F ratios can be tested as 
follows: 

$$F_L > (F_S)\,\bigl(F_\alpha(\text{df numerator},\ \text{df denominator})\bigr)$$ (25) 

where $F_L$ and $F_S$ represent reliabilities in the form of F ratios as shown in 
Table 3, $F_L$ representing the larger F ratio and $F_S$ the smaller. To obtain 
$F_\alpha$, the degrees of freedom in the numerator and denominator should 
correspond to the degrees of freedom in the numerator and denominator of $F_L$ 
and $F_S$. The degrees of freedom for $F_L$ should approximately equal those for 
$F_S$ for the test to be valid. When quasi-F ratios are used, the degrees of 
freedom for $F_\alpha$ should correspond to adjusted degrees of freedom as given in 
Equation (24). The test should be used with some caution with quasi-F ratios. 

Sample Size Requirements 



Organizational research is costly and time consuming. For these reasons, it 
is important to be able to estimate ahead of time the sample sizes needed to 
obtain specified levels of reliability desired by the researcher. How many 
subjects within each group, and how many items in a scale are needed to obtain a 
specified level of reliability, say .75, as measured by the formulas in Table 2? 
Estimates of the mean square terms in Table 2 can be obtained from a pretest 
sample, and from the pretest sample the number of subjects and items that are 
needed for a specified level of reliability can be estimated. 

Table 3 

Statistical Significance of Reliability Coefficients (a) 

Reliability Formula (b)   Test Definition          F Test (c)                                                                   Formula Number 

[illegible in source]     [illegible in source]    [illegible in source]                                                        (15) 
3                         [illegible in source]    [illegible in source]                                                        (16) 
[illegible in source]     [illegible in source]    $MS_C / \{[(N_S - s)/N_S]\,MS_S + (s/N_S)\,MS_{SQ}\}$                        (17) 
[illegible in source]     [illegible in source]    [illegible in source]                                                        (18) 
[illegible in source]     [illegible in source]    [illegible in source]                                                        (19) 
[illegible in source]     [illegible in source]    [illegible in source]                                                        (20) 
[illegible in source]     [illegible in source]    $MS_C / \{[(N_S - s)/N_S]\,MS_S - [(N_S - s)/N_S]\,MS_{SQ} + MS_{CQ}\}$      (21) 
[illegible in source]     [illegible in source]    [illegible in source]                                                        (22) 
10                        [illegible in source]    [illegible in source]                                                        (23) 

(a) The test is for the significance of the difference of the reliability 
coefficient from zero. It is defined in terms of true plus error (total) 
variance over error variance alone. It will answer the question of whether it is 
possible to detect any true variance at all. The component $\sigma^2_{SQ}$ is assumed 
zero, $MS_{SQ} = \sigma^2_E$, for Formulas (17) and (19). 
(b) The numbers refer to the reliability formulas in Table 2. 
(c) When two or more mean squares are found in the denominator, the F test is an 
approximation which is obtained by adjusting the degrees of freedom for the 
denominator by Formula 24. 

Table 4 

Reliability Formulas for Single Scores as a Function of Unit of Analysis and Sampling Plan 

Unit of Analysis   Sampling Plan                    Score Estimated            Reliability Definition   Formula                                                                  Number 

Subjects (S)       Items random, Subjects random    Single item                [illegible in source]    [illegible in source]                                                    (26) 
Companies (C)      Items random, Subjects fixed     Single item/subjects       [illegible in source]    [illegible in source]                                                    (27) 
Companies (C)      Items fixed, Subjects random     Single subject/items       [illegible in source]    $(MS_C - MS_S) / [MS_C + (s - 1)\,MS_S]$                                 (28) 
Companies (C)      Items random, Subjects random    Single item/subjects       [illegible in source]    numerator $MS_C - MS_S - MS_{CQ} + MS_{SQ}$; denominator illegible       (29) 
Companies (C)      Items random, Subjects random    Single subject/items       [illegible in source]    numerator $MS_C - MS_S - MS_{CQ} + MS_{SQ}$; denominator illegible       (30) 

Note. All formulas are related to the corresponding formulas in Table 2 in terms 
of the Spearman-Brown prediction formula, which takes the following form for 
sample size: 

$$n = \frac{R_w (1 - R_1)}{R_1 (1 - R_w)}$$ 

where $R_w$ equals the reliability the researcher wants, $R_1$ equals the 
reliability of a single score as given in this table, and $n$ equals the sample 
size required. If $q_2$ equals the number of items required, $s_2$ equals the 
number of subjects required in each group, $q_1$ equals the number of items in the 
pretest, and $s_1$ equals the number of subjects within groups in the pretest, 
then $n = q_2$, given $s_1 = s_2$, or $n = s_2$, given $q_1 = q_2$. 

The way this problem has been solved in the standard case, where individuals 
are the unit of analysis, has been to estimate the reliability of a single score 
(Formula 26, Table 4), which is related to the reliability of the average score 
(Formula 2, Table 2) in terms of the Spearman-Brown prediction formula. Solving 
the Spearman-Brown prediction formula for the sample size tells how many items 
must be added to obtain the desired reliability (see Winer, 1971, p. 287). This 
same approach was used in Table 4 for other formulas. However, when the unit of 
analysis involves a group, the reliability of single scores involves 
contingencies: the reliability of a single item given the same number of 
subjects as was found in the pretest sample, or the reliability of a single 
subject given the same number of items as found in the pretest questionnaire. 
Given these contingencies, the formulas in Table 4 are related to the 
corresponding formulas in Table 2 in terms of the Spearman-Brown prediction 
formula. The corresponding formulas are those with the same unit of analysis and 
sampling plan. As shown in Table 4, sample size can then be found from the 
Spearman-Brown formula. Formula (28) and the Spearman-Brown formula can be 
expressed in more convenient form by solving (28) in terms of the F ratio, 
$F = MS_C / MS_S$, and substituting this into the Spearman-Brown formula. The 
number of subjects needed in each group ($s_2$) can then be found as follows: 

$$s_2 = \frac{R_w\, s_1}{F(1 - R_w) + R_w - 1}$$ (31) 

where $R_w$ equals the reliability desired, $s_1$ the sample size in each pretest 
group, and $F = MS_C / MS_S$. 
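
A short numerical sketch may help show how the F-ratio shortcut relates to the two-step Spearman-Brown route. It assumes the single-subject reliability $(MS_C - MS_S)/[MS_C + (s_1 - 1)MS_S]$ for Formula (28), the Spearman-Brown sample-size form $n = R_w(1 - R_1)/[R_1(1 - R_w)]$, and the closed form $s_2 = R_w s_1 / [F(1 - R_w) + R_w - 1]$ for Formula (31); the function names and pretest values are hypothetical.

```python
def single_subject_reliability(ms_c, ms_s, s1):
    """Reliability of one subject's score, given s1 subjects per pretest group."""
    return (ms_c - ms_s) / (ms_c + (s1 - 1) * ms_s)


def spearman_brown_n(r_wanted, r_single):
    """Spearman-Brown formula solved for the number of single scores needed."""
    return r_wanted * (1 - r_single) / (r_single * (1 - r_wanted))


def subjects_needed(ms_c, ms_s, s1, r_wanted):
    """Closed form in terms of F = MS_C / MS_S.

    A negative result (F below 1) means no number of subjects reaches r_wanted.
    """
    f_ratio = ms_c / ms_s
    return r_wanted * s1 / (f_ratio * (1 - r_wanted) + r_wanted - 1)


if __name__ == "__main__":
    # Hypothetical pretest: MS_C = 12, MS_S = 3, ten subjects per company.
    r1 = single_subject_reliability(12.0, 3.0, 10)
    print(spearman_brown_n(0.90, r1))          # two-step route
    print(subjects_needed(12.0, 3.0, 10, 0.90))  # one-step route
```

Both routes give the same number of subjects per group; the one-step form only needs the pretest F ratio and group size.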

The problem with using Formulas (27) through (31) to estimate sample size 
requirements is that the number of subjects needed ($s_2$) can only be estimated 
given that the number of items to be used in the final questionnaire ($q_2$) 
equals the number of items ($q_1$) in the pretest sample. The number of items 
needed in the questionnaire ($q_2$) can only be estimated given that the number of 
subjects to be used in the final sample ($s_2$) equals the number used in the 
pretest ($s_1$). Also, if the unit of analysis is at a higher level than 
companies, the pretest sample must be assumed to have the same subordinate group 
structure as in the final sample. Another serious problem is that the preceding 
approach does not work for some formulas, namely when subjects or items are 
semirandom. There are problems with the concept of a single-score reliability in 
the semirandom case. 

The sample size requirement problem was solved for all formulas without any 
contingencies, by estimating variance components from pretest data independently 
of the number of subjects or items in the pretest, substituting the sample sizes 
desired, $s_2$ and $q_2$, for pretest coefficients $s_1$ and $q_1$ where they 
appeared in the reliability definitions, and then solving for $s_2$ and $q_2$. The 
required formulas are shown in Table 5. From Table 5, the number of subjects or 
items required for any formula in Table 2 can be estimated from pretest data 
without any contingencies. For example, a researcher can estimate the number of 
subjects required ($s_2$), given that X number of items are added to a scale over 
what existed in the pretest. Similarly, the number of items ($q_2$) can be 
estimated, given that the sample size within each group in the final sample is 
larger than it was in the pretest. Of course, the assumption is made that the 
items that are added are intercorrelated to the same degree as the pretest 
items, and that subjects added discriminate between groups to the same extent as 
in the pretest. The formulas in Table 5 can be used for any of the units of 
analysis A, B or C, without contingencies, using the appropriate substitutions 
given in that table. 

Table 5 

Formulas for Determining Sample-Size Requirements from Pretest Data 

Reliability     Sample Size Formulas                                             Formula 
Formula (a)     Number of Subjects             Number of Items                   Number 

3               [illegible in source]                                            (32) 
4               $s_2 = A/(C + H)$                                                (33) 
5                                              $q_2 = B/D$                       (34) 
6                                              $q_2 = B/(D + I)$                 (35) 
7               $s_2 = A/(E - G)$              $q_2 = B/(E - F)$                 (36) 
8               $s_2 = A/(E - G + H)$          $q_2 = B/(E - F + H)$             (37) 
9               $s_2 = A/(E - G + I)$          $q_2 = B/(E - F + I)$             (38) 
10              $s_2 = A/(E - G + H + I)$      $q_2 = B/(E - F + H + I)$         (39) 

Note. [The definitions of the terms A through I are illegible in the source.] 
$R_w$ is the value of the reliability that the researcher wants to obtain in a new 
sample. The symbol $s_2$ refers to the number of subjects within each group that is 
needed to obtain the desired reliability $R_w$, while $s_1$ is the pretest sample 
size within each group. Similarly, $q_2$ refers to the number of items needed to 
obtain the stated $R_w$, while $q_1$ is the number of items in the pretest. $N_S$ is 
the population size within each company, while $N_Q$ is the size of the population 
of items. The mean square terms are based on the pretest data using the original 
model given in Formula (1). The assumption that $\sigma^2_{SQ} = 0$ must be made for 
Formulas (32), (33), (34), and (35). When A or B is the unit of analysis, $MS_A$ 
or $MS_B$ is substituted for $MS_C$, and $MS_{AQ}$ or $MS_{BQ}$ for $MS_{CQ}$. 

(a) The numbers refer to the reliability formulas found in Table 2. 

Adding items to a survey scale will increase reliability as defined by 
Formulas (3) and (4) only to a limited extent (i.e., by increasing the 
coefficients of $\sigma^2_C$ and $\sigma^2_S$ in relation to $\sigma^2_E$), and likewise 
increasing the number of subjects will increase Formulas (5) and (6) only to a 
limited extent (i.e., by increasing the coefficients of $\sigma^2_C$ and 
$\sigma^2_{CQ}$ in relation to $\sigma^2_E$). Therefore, it is not meaningful to solve 
the equations for items ($q_2$) for Formulas (32) and (33), or for subjects 
($s_2$) for Formulas (34) and (35). Negative estimates from any of the formulas in 
Table 5 mean an infinity of subjects or items would be needed to obtain the 
requisite reliability, i.e., the desired level of reliability cannot be obtained 
by adding to the sample size. 



Unbalanced Designs 



Effects on Formulas 

The derivation of all the previous formulas has been based on the assumption 
of a balanced design, i.e., equal sample and group sizes across levels of all 
factors. This, of course, rarely occurs in the intact organizations that are of 
interest here. The impact of unbalanced designs on the expected mean squares, 
for the model at Equation (1), is shown in Table 6. When balanced formulas are 
used to calculate the mean squares for the model at Equation 1 when the model is 
not balanced, the resulting mean squares contain elements of variance components 
from a variety of extra terms. A comparison of Tables 6 and 1 shows additional 
components, or elements of these components, added by unbalance. How the 
confounding is handled depends entirely on the hypotheses being tested. For 
purposes of reliability estimation, researchers do not wish to generalize to 
hypothetical organizations in which groups are all the same size, with equal 
numbers of, say, blacks and whites in each. Such a balanced hypothesis is 
clearly irrelevant and inappropriate for intact organizations. Generalizations 
are made to the intact organization where subgroups vary. In the intact 
organization the crossed term Race (R) and the subordinate hierarchical terms 
B(A) and C(AB) are fixed. When these terms are all fixed, it is appropriate to 
consider all confounded elements added by unbalance to the "between people" 
components of $MS_A$, $MS_B$, or $MS_C$ as true variance, since that sort of 
confounding exists naturally in the intact organization to which generalizations 
are being made. However, when questionnaire items (Q) are considered random, all 
confounded elements added by unbalance to the "within people" components of 
$MS_A$, $MS_B$, or $MS_C$ can best be considered error. These confounded elements 
all represent interactions with the random term Q. Since Q is random, items 
change from one sample to another, and so would interactions with Q, which 
suggests these confounded elements should be considered error. When the 
preceding allocation of confounded elements is made between true and error 
variance, the reliability formulas, tests, and sample size requirements given 
previously in Tables 2, 3, 4 and 5 remain unchanged. However, it should be 
recognized that reliability and test definitions contain additional confounded 
elements as shown in Table 6. 

Table 6 

Unbalanced Expected Mean Squares 

Between People components 

Model Term   A     B(A)   C(AB)   R     AR    BR(A)   CR(AB)   S(ABCR) 
A            abc   ac     ac      abc   abc   abc     abc      abc 
B(A)               abc    ac      abc   abc   abc     abc      abc 
C(AB)                     abc     abc   abc   abc     abc      abc 
S(ABCR)                                                        abc 

Within People components 

Model Term   Q     AQ    BQ(A)   CQ(AB)   RQ    ARQ   BRQ(A)   CRQ(AB)   SQ(ABCR) 
A                  ab    a       a        ab    ab    ab       ab        abc 
B(A)                     ab      a        ab    ab    ab       ab        abc 
C(AB)                            ab       ab    ab    ab       ab        abc 
S(ABCR)                                                                  abc 
AQ                 abc   ac      ac       abc   abc   abc      abc       abc 
BQ(A)                    abc     ac       abc   abc   abc      abc       abc 
CQ(AB)                           abc      abc   abc   abc      abc       abc 
SQ(ABCR)                                                                 abc 

Note. The model is based on Equation 1. The expected mean squares for the terms 
at left are found in the unbalanced case by looking along the row for common 
letters that represent the following conditions: (a) confounding Between Groups, 
confounding with Race (R), Q random; (b) no confounding Between Groups, 
confounding with Race (R), Q random; (c) confounding Between Groups, confounding 
with Race (R), Q fixed. In each case Subjects (S) is considered random. 

An additional problem remains for hypothesis testing with unbalanced 
designs. Mean square terms are no longer independent, an assumption required for 
numerators and denominators of F tests. Tests should be made with caution when 
unbalance is severe. This problem is not unique to reliability estimation, and 
is frequently encountered in unbalanced analysis of variance designs. 



Weighting Scores 

Unbalanced designs and sampling requirements often necessitate weighting 
individual scores in order to appropriately estimate reliability. Since sample 
size affects reliability, as shown previously, weights must be applied in a manner 
that does not affect the total sample size. Weights are appropriate in the 
following three situations. 

First, using a stratified sampling plan, the crossed term Race (R) might not 
be sampled in proportion to company racial populations. Blacks might be sampled 
at a higher rate in order to get a sufficient minority sample size. When 
estimating a total company score, ignoring race, the individual scores within 
each company need to be weighted to estimate what would have been obtained 
without disproportionate sampling. In this case the individual scores within 
each company are weighted according to the following formula: 




(40) 



where $W_{Bi}$ represents the weight for black subjects in company i, $N_{Bi}$ and 
$N_{Ti}$ represent, respectively, the black and total population sizes in company 
i, and $n_{Bi}$ and $n_{Ti}$ represent, respectively, the black and total survey 
sample sizes. To obtain the weight for white subjects in company i, $N_{Wi}$ and 
$n_{Wi}$, representing, respectively, the population and sample sizes of whites in 
company i, are substituted to replace $N_{Bi}$ and $n_{Bi}$ in Formula 40. 

A second reason for weighting individual scores is to insure that the units 
of analysis are weighted equally. Since each unit, as a data point, is weighted 
equally when used in correlation or other statistics, each unit should be 
weighted equally when estimating reliability. Typically, equal sample sizes are 
obtained from groups at the level intended for use as the unit of analysis, 
providing equal weights. However, weights equal at this level will not be equal 
at another level when hierarchical levels are confounded. Furthermore, a simple 
random sample may have been used which will produce unequal weights when group 
sizes differ. In these cases, individual scores within each group or company are 
weighted as follows: 

- 1 . "I (41) 
W s - • - 

% Si 

where W, is the weight given individual responses within each company , and 
represent, respectively, the population and sample size for company i, and N, T and 
n T represent, respectively, the population and sample totals for all companies 
combined . 

A third reason for weighting individual scores is to accurately estimate 
the error terms in Table 2 when subjects are considered semirandom (Formulas 4, 8, 
and 10). Each unit should be weighted equally in terms of sample size, but the 
company population sizes are unlikely to be equal as well. That means the sampling 
term (N - s) / N found in Table 2 will differ from company to company. In order 
to accurately estimate the error terms MS_S and MS_SQ for these semirandom 
formulas, individual scores within each company should be weighted as follows: 

                                                                        (42) 

where W_i equals the weight in each company and N_i and s_i represent, 
respectively, the population and sample sizes in each company. MS_S and MS_SQ, 
obtained from scores weighted by (42), are substituted in Formulas 4, 8, and 10 
to replace the corresponding terms that are multiplied by (N - s) / N. The other 
mean square terms are estimated without weighting. 

The three types of weighting given in Formulas (40), (41), and (42) may be 
used separately or together in any combination as appropriate. The weights given 
in (40) and (41) maintain the original sample sizes as required. 
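
The first kind of weighting can be illustrated with a short sketch. It assumes that the weight for a stratum within one company is its population share divided by its sample share, which is one common way of undoing disproportionate sampling; the function name and all counts are hypothetical and are not taken from the report. The final line checks the requirement stated above, that the weights leave the total sample size unchanged.

```python
def stratum_weights(pop_sizes, sample_sizes):
    """Return one weight per stratum: population share / sample share.

    pop_sizes and sample_sizes map a stratum label to its count within
    one company. Hypothetical helper, for illustration only.
    """
    pop_total = sum(pop_sizes.values())
    sample_total = sum(sample_sizes.values())
    weights = {}
    for stratum, n_sampled in sample_sizes.items():
        pop_share = pop_sizes[stratum] / pop_total
        sample_share = n_sampled / sample_total
        weights[stratum] = pop_share / sample_share
    return weights


# Made-up company: 40 of 200 soldiers are black, but 20 of 50 sampled are black.
population = {"black": 40, "white": 160}
sample = {"black": 20, "white": 30}
weights = stratum_weights(population, sample)

# Weighted sample size: each respondent counts as `weight` respondents.
weighted_n = sum(weights[s] * sample[s] for s in sample)
```

In this made-up company the oversampled stratum receives a weight below one, the undersampled stratum a weight above one, and the weighted sample size stays at 50.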



Synchronization Measures 



Making the Measures Comparable 

Synchronization measures are shown in Table 7. These measures are used for 
selecting a unit of analysis. High synchronization for a unit pinpoints the 
level of the organization that exercises responsibility and control over the 
subject matter represented by the scale. 

Table 7 

Synchronization Measures for Determining 
the Unit of Analysis 

Unit of Analysis   Synchronization Definition                    Formula^a                                 Formula Number 

Companies (C)      σ^2_C / [σ^2_C + (q σ^2_S + σ^2_E) / qrs]     (MS_C - MS_S) / MS_C                      (43) 
Battalions (B)     σ^2_B / [σ^2_B + (q σ^2_S + σ^2_E) / qrs]     (MS_B - MS_S) / [MS_B + (c - 1) MS_S]     (44) 
Brigades (A)       σ^2_A / [σ^2_A + (q σ^2_S + σ^2_E) / qrs]     (MS_A - MS_S) / [MS_A + (bc - 1) MS_S]    (45) 

^a Subjects are considered random and items fixed. Formulas (44) and (45) 
differ from reliability formulas by an adjustment which makes the number of 
subjects within Brigades (A) and Battalions (B) hypothetically equal, for 
purposes of comparison, to the numbers within each company (C). 

These measures provide a way of 
directly comparing the extent of synchronization at each level of the hierarchy, 
A, B, and C. At each higher level of the hierarchy the number of subjects within 
the unit of analysis increases. An increase in subjects also increases reliability 
as measured by Formula 3. Reliability as measured by Formula 3 is again used as a 
synchronization measure, but only for the lowest level in the hierarchy, in this 
case Companies (C). The synchronization definitions and formulas for the 
higher levels of the hierarchy, B and A, are adjusted statistically so that they 
have the same number of subjects within groups at the higher levels as was found 
at the lowest level, C. With this adjustment, the synchronization measures all 
become directly comparable. If a comparison of Battalion (B) and Brigade (A) 
synchronization is desired by itself, ignoring Companies (C), the sample size 
adjustment can be made on Brigades, making Brigades equal in size to the level 
just below, Battalions, as follows: 

S_B = (MS_B - MS_S) / MS_B                                              (46) 

S_A = (MS_A - MS_S) / [MS_A + (b - 1) MS_S]                             (47) 

where S_B equals synchronization for Battalions, and S_A synchronization for 
Brigades. 
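
A minimal sketch of the size adjustment, following the form of Formula (47): the higher-level mean square is combined with (k - 1) copies of the subject mean square MS_S, where k is the number of comparison-level groups per higher-level group. The function name and the mean-square values are hypothetical. Setting k = 1 gives the unadjusted ratio (MS - MS_S) / MS used at the lowest level being compared.

```python
def synchronization(ms_group, ms_subjects, k=1):
    """Synchronization for a group level from two mean squares.

    ms_group    -- mean square between groups at the level of interest
    ms_subjects -- mean square for subjects within groups (MS_S)
    k           -- number of comparison-level groups per group at this
                   level (1 for the lowest level being compared)
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return (ms_group - ms_subjects) / (ms_group + (k - 1) * ms_subjects)


# Made-up mean squares for a Battalion versus Brigade comparison with
# b = 3 battalions per brigade.
ms_s = 2.0
s_battalion = synchronization(ms_group=10.0, ms_subjects=ms_s)      # k = 1
s_brigade = synchronization(ms_group=26.0, ms_subjects=ms_s, k=3)   # k = b
print(f"Battalions: {s_battalion:.3f}  Brigades: {s_brigade:.3f}")
```

The brigade mean square here was constructed as 3 x 8 + 2, that is, the same "true" component as the battalions but accumulated over three times as many subjects. After the adjustment both levels give 0.8, which is the sense in which the measures become directly comparable.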



Significance of Difference Between Measures 

With Formulas (43) to (45), the degree of synchronization can be compared 
directly for each level of the hierarchy, to determine the best unit of analysis. 
Finally, whether synchronization at one level is significantly greater than 
synchronization at another can be tested by forming appropriate quasi-F ratios as 
shown in Table 8. Each of the synchronization measures shares a common "error" 
term, MS_S, which is ignored when comparing relative sizes of synchronization 
measures because it is held in common. Independent mean squares are needed for 
F ratios. Comparing synchronization can be accomplished by comparing the 
relative sizes of the "total" variance that has been adjusted for equal group 
sizes, ignoring MS_S for the reason stated. Company synchronization is compared to 
Battalion and Brigade synchronization in Formulas (48) and (49), and Battalion 
to Brigade in (50). For the latter comparison, Brigade size is adjusted to equal 
Battalion size in order to get a test with independent mean squares in the 
numerator and denominator of the F test. Power is greater for the test in Formula 
(50) than for the tests in (48) and (49). 

Table 8 

Significance of Differences Between Synchronization Measures 

Comparison                      Test Definition                                                          F Test^a                             Number 

Companies (C) / Battalion (B)   (qrs σ^2_C + q σ^2_S + σ^2_E) / (qrs σ^2_B + q σ^2_S + σ^2_E)            c MS_C / [MS_B + (c - 1) MS_S]       (48) 
Companies (C) / Brigade (A)     (qrs σ^2_C + q σ^2_S + σ^2_E) / (qrs σ^2_A + q σ^2_S + σ^2_E)            bc MS_C / [MS_A + (bc - 1) MS_S]     (49) 
Battalion (B) / Brigade (A)     (cqrs σ^2_B + q σ^2_S + σ^2_E) / (cqrs σ^2_A + q σ^2_S + σ^2_E)          b MS_B / [MS_A + (b - 1) MS_S]       (50) 

Note. Formula (48) as written tests whether company synchronization is 
greater than battalion synchronization. The numerator and denominator can 
be reversed to test whether battalion synchronization is greater. 

^a Degrees of freedom for quasi-F tests are found by referring to Formula (24) 
in the text. 

When the hierarchical levels A, B, and C are confounded, individual scores 
may need to be weighted by Formula (41) to ensure that each unit of analysis is 
weighted equally. The weights, when needed, will change as confounded 
hierarchical levels change. The coefficients c and bc in Formulas (44) and (45) 
are averages when the terms A, B, and C are confounded and weights are used. When 
different weights are applied at different hierarchical levels in a confounded 
design, the mean squares in the numerator and denominator of the preceding tests 
are no longer independent, so tests of the significance of the difference 
between synchronization measures should be used with caution in this case. 
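
The quasi-F comparisons can be sketched as below. The denominator follows the pattern that is legible in Table 8, the higher-level mean square plus (k - 1) times MS_S. Scaling the lower-level mean square by the same k is inferred so that both parts refer to groups of equal size. The denominator degrees of freedom use the standard Satterthwaite approximation for a linear combination of mean squares; this is an assumption and may differ from Formula (24), which is not reproduced in this section. All names and numbers are hypothetical.

```python
def quasi_f(ms_lower, ms_upper, ms_subjects, k):
    """Quasi-F ratio comparing lower-level with higher-level synchronization.

    k is the number of lower-level groups per higher-level group
    (c, bc, or b in Formulas (48) to (50)).
    """
    return k * ms_lower / (ms_upper + (k - 1) * ms_subjects)


def satterthwaite_df(mean_squares, coefficients, dfs):
    """Approximate df for sum(a_i * MS_i) by the Satterthwaite formula."""
    total = sum(a * ms for a, ms in zip(coefficients, mean_squares))
    denom = sum((a * ms) ** 2 / df
                for a, ms, df in zip(coefficients, mean_squares, dfs))
    return total ** 2 / denom


# Made-up values: c = 4 companies per battalion.
c = 4
ms_c, ms_b, ms_s = 12.0, 18.0, 2.0
df_b, df_s = 5, 400

f_ratio = quasi_f(ms_c, ms_b, ms_s, k=c)
df_denominator = satterthwaite_df([ms_b, ms_s], [1, c - 1], [df_b, df_s])
```

The numerator keeps the degrees of freedom of the lower-level mean square. Because MS_S usually has many more degrees of freedom than the group mean square, the approximate denominator value stays close to the group-level degrees of freedom.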



Removing Synchronization 

When synchronization is found at more than one level of the hierarchy, the 
synchronization at the higher level can be partialed out using dummy regression, 
if desired. The existence of synchronization at each level can be tested by 
applying Formula (16) at each level of hierarchy to see if significant "true" 
variance can be identified at each level. The power of the test in Formula (16) 
is higher at higher levels. The number of degrees of freedom remaining after a 
higher-level group is partialed out may be reduced sharply as a result of 
removing synchronization. Removing synchronization from higher levels, however, 
would leave the researcher with results that could be unambiguously attributed to 
the lower-level unit and its leaders. Depending on hypotheses, this might be a 
desirable or an artificial result. It is possible, however, to statistically 
eliminate synchronization from higher levels when desired. 
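
One way to picture the dummy regression is sketched below. Regressing scores on a full set of dummy variables for the higher-level groups gives fitted values equal to the group means, so the residuals are deviations from those means. The function and the data are hypothetical, and the sketch covers only the simplest case of one higher level with no other predictors.

```python
from collections import defaultdict


def partial_out(scores, higher_groups):
    """Remove higher-level group effects from scores.

    Equivalent to taking residuals from a regression of scores on
    dummy variables for the higher-level groups.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, group in zip(scores, higher_groups):
        totals[group] += score
        counts[group] += 1
    means = {g: totals[g] / counts[g] for g in totals}
    return [score - means[group] for score, group in zip(scores, higher_groups)]


# Made-up company scores nested in two battalions.
company_scores = [3.0, 4.0, 5.0, 7.0, 8.0, 9.0]
battalions = ["B1", "B1", "B1", "B2", "B2", "B2"]
residuals = partial_out(company_scores, battalions)

# The battalion means use up one degree of freedom per battalion.
df_remaining = len(company_scores) - len(set(battalions))
```

With six companies in two battalions, only four degrees of freedom remain for company-level analysis, which illustrates how sharply the count can drop when higher-level groups are few in number relative to the units they contain.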



Computational Requirements 

There are two primary difficulties in computing the reliability and 
synchronization measures and tests given in this paper. The most serious 
difficulty is the computer core space required to compute a large split-plot 
analysis of variance design. All of the commonly used general analysis of 
variance packages, including SAS, RUMMAGE, BMP, MULTIVARIATE, and IMSL, greatly 
exceed the core limitations of virtually all computers for even modestly sized 
split-plot designs that involve even a moderate number of subjects. As the number 
of subjects in a split-plot design increases, factors that include subjects become 
huge. Commonly used analysis of variance packages attempt to store these huge 
factors in core. One exception is the BMDP2V program, which does not require an 
unreasonable amount of core but cannot compute the hierarchical portion of the 
design: only one level in a hierarchy is possible. A general analysis of 
variance program, capable of analyzing any design, was written to compute 
reliabilities for aggregated scores. The input data were organized by sorting to 
alleviate the cell storage problems. Multiple sorts are required for one run on 
a given model, but a large number of reliabilities can be computed during a 
single run. 

The amount of computer CPU time taken to compute these reliabilities is a 
second problem. Most general analysis of variance packages create dummy 
variables to calculate either balanced or unbalanced designs, but in split-plot 
designs the number of dummy variables required is often huge, requiring large 
amounts of computer time. The general analysis of variance program that was 
written for computing reliabilities uses the balanced algorithm given 
previously. The balanced algorithm is appropriate for unbalanced data when 
confounded components in an unbalanced design are allocated between true and 
error variance, as outlined previously. The algorithm was modified slightly in 
order to make the algebra appropriate in the unbalanced as well as the balanced 
case. Looking back at the steps required to get sums of squares, step (c) follows 
immediately after step (a) when applied to the unbalanced case. Degrees of freedom 
are obtained by getting the sum of the cells associated with main effects that are 
listed, instead of the product of the levels of the main effects listed, as given 
for the balanced case (see p. 4). The balanced algorithm in this program computes 
reliabilities much more rapidly than do programs that generate dummy variables. 
Multiple sorts on input data do, however, take some I/O ("wall clock") time, but 
this is required to alleviate the more serious core storage problems. 
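
The trade-off between sorting and core storage can be illustrated with a rough sketch. It is not the authors' program: it handles only a one-way layout, and every name in it is hypothetical. Records sorted by group are read in a single pass and only running totals are kept, so no dummy variables or full cell tables are held in memory. Degrees of freedom come from counting the groups actually present rather than from multiplying level counts, so unbalanced data need no special handling.

```python
from itertools import groupby


def one_way_from_sorted(records):
    """One-way ANOVA sums of squares from (group, score) records.

    Records must already be sorted by group. Only running totals are
    kept, so memory use does not grow with the number of subjects.
    """
    n_total = 0
    grand_sum = 0.0
    sum_sq = 0.0        # sum of squared scores
    group_term = 0.0    # sum over groups of (group total)^2 / group size
    n_groups = 0
    for _, rows in groupby(records, key=lambda rec: rec[0]):
        size = 0
        total = 0.0
        for _, score in rows:
            size += 1
            total += score
            sum_sq += score * score
        n_total += size
        grand_sum += total
        group_term += total * total / size
        n_groups += 1
    correction = grand_sum * grand_sum / n_total
    ss_between = group_term - correction
    ss_within = sum_sq - group_term
    return ss_between, n_groups - 1, ss_within, n_total - n_groups


# Made-up, unbalanced data: company label and one subject's scale score.
data = [("C1", 2.0), ("C1", 4.0), ("C2", 5.0), ("C2", 7.0), ("C2", 9.0)]
data.sort(key=lambda rec: rec[0])  # the sort stands in for in-core cell storage
ss_b, df_b, ss_w, df_w = one_way_from_sorted(data)
ms_between, ms_within = ss_b / df_b, ss_w / df_w
```

A hierarchical design would need one such pass per sort order, which corresponds to the multiple sorts mentioned above.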

Summary 



When research is conducted with intact organizations, groups rather than 
individuals are used frequently as the unit of analysis. One advantage of using 
groups as units is that, in this case, interaction within these groups can be 
studied. If groups are selected as the unit of analysis, what level of the 
organizational hierarchy should be selected for study? A statistical technique 
is suggested for selecting groups at the most appropriate level of the organiza- 
tional hierarchy, at a level that actually controls and is responsible for the 
subject matter. This technique measures the extent of synchronization within 
groups at different levels of the hierarchy. The level selected for the unit 
should generally be the level with greatest synchronization. 

After selecting an appropriate group unit of analysis, how should 
reliability be estimated? Survey variables consist of scores aggregated over both 
subjects within groups and survey items. The traditional methods of estimating 
reliability are either incomplete or inappropriate when applied to estimating the 
reliability of these aggregated scores. Using analysis of variance, appropriate 
reliability formulas were derived that depend on both the unit of analysis and 
the survey sampling plan. In addition, significance tests for these reliabilities 
were given, as well as formulas to determine sample-size requirements from 
pretest data. A technique for estimating the reliability of record data, in the 
form of frequency counts within groups, is also given. Together, these 
statistical techniques provide improved methods for studying the operation of 
organizations. 



Information about the availability of this computer program may be obtained by 
writing the authors at Army Research Institute Field Unit, P.O. Box 5787, 
Presidio of Monterey, CA 93940. The program has been written so that it is easy 
to use with simple model input statements. Implementation on different computers 
could pose problems, depending on the extent to which the program is given 
continued attention and development by the authors. 